The type-token relationship in Slavic parallel texts
Abstract
The aim of this paper is to analyse the statistical regularity of the type-token relationship in Slavic parallel texts. Furthermore, it is shown that this relationship in parallel texts can be explained by morphological and typological characteristics.
Keywords: type-token relationship, Slavic languages, corpus, parallel texts
Similar resources
Preliminary Analysis of a Slavic Parallel Corpus
The focus of this paper is on a detailed description of a newly developed parallel corpus of Slavic languages. It consists of 11 Slavic translations of the well-known Russian socialist realist novel “Kak zakaljalas’ stal’/How the steel was tempered” (KZS), written by N.A. Ostrovskij in the years 1932-34. The KZS contains the Slovene, Croatian, Serbian (ekavian), Macedonian, Bulgarian, Ukrainian,...
Computational and Linguistic Issues in Designing a Syntactically Annotated Parallel Corpus of Indo-European Languages
This paper reports on the development of the PROIEL parallel corpus of New Testament texts, which contains the Greek original of the New Testament and its earliest Indo-European translations, into Latin, Gothic, Old Church Slavic and Classical Armenian. A web application has been constructed specifically for the purpose of annotating the texts at multiple levels: morphology, syntax, alignment at...
On the dependency of word length on text length. Empirical results from Russian and Bulgarian parallel texts
This paper tackles two basic problems of quantitative linguistics: firstly the “word length” and secondly the text length in terms of type and token numbers. It has to be shown that these two basic properties of a text are directly related. The interrelation between word length and text length can be captured by an appropriate mathematical model; hence a law-like status of the interrelation bet...
An Evaluation Exercise for Word Alignment
This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the HLT/NAACL 2003 Workshop on Building and Using Parallel Texts. The shared task included Romanian-English and English-French sub-tasks, and drew the participation of seven teams from around the world. 1 Defining a Word Alignme...
Language Related Issues for Machine Translation between Closely Related South Slavic Languages
Machine translation between closely related languages is less challenging and exhibits a smaller number of translation errors than translation between distant languages, but there are still obstacles which should be addressed in order to improve such systems. This work explores the obstacles for machine translation systems between closely related South Slavic languages, namely Croatian, Serbian...
Journal title:
- Glottometrics
Volume 20, Issue
Pages -
Publication date 2010